Overview

Dataset statistics

Number of variables: 4
Number of observations: 99441
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.0 MiB
Average record size in memory: 32.0 B
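
The two memory figures can be cross-checked with simple arithmetic. The sketch below assumes, as a guess that the report does not state, that every cell costs 8 bytes (an int64 value or a 64-bit object pointer) and that the index is ignored; the constant and function names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
N_ROWS = 99_441
N_COLS = 4

# Assumption (not stated in the report): 8 bytes per cell,
# e.g. an int64 value or a 64-bit pointer to a Python object.
BYTES_PER_CELL = 8


def shallow_size_bytes(n_rows: int, n_cols: int, bytes_per_cell: int = BYTES_PER_CELL) -> int:
    """Estimate the shallow in-memory size of a table, ignoring the index."""
    return n_rows * n_cols * bytes_per_cell


total_bytes = shallow_size_bytes(N_ROWS, N_COLS)
total_mib = total_bytes / 2**20
avg_record_bytes = total_bytes / N_ROWS

print(f"Total size: {total_mib:.1f} MiB")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under that assumption the estimate rounds to the reported 3.0 MiB and 32.0 B. It counts only the fixed-width cell storage, not the string contents the text columns point to.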

Variable types

Text: 2
Numeric: 1
Categorical: 1

Alerts

customer_state is highly overall correlated with customer_zip_code_prefix (High correlation)
customer_zip_code_prefix is highly overall correlated with customer_state (High correlation)
customer_id has unique values (Unique)

Reproduction

Analysis started: 2024-11-21 11:10:40.697541
Analysis finished: 2024-11-21 11:14:02.494630
Duration: 3 minutes and 21.8 seconds
Software version: ydata-profiling v4.12.0
Download configuration: config.json

Variables

customer_id
Text

Unique 

Distinct: 99441
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 777.0 KiB

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 3182112
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 99441
Unique (%): 100.0%

Sample

1st row: 06b8999e2fba1a1fbc88172c00ba8bc7
2nd row: 18955e83d337fd6b2def6b18a428ac77
3rd row: 4e7b3e00288586ebd08712fdd0374a03
4th row: b2b6027bc5c5109e529d4dc6358b12c3
5th row: 4f2d8ab171c80ec8364f7c12e35b23ad
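
A fixed length of 32, 16 distinct characters, and only decimal digits plus lowercase letters are consistent with customer_id being a lowercase hexadecimal string. The report does not say how the identifiers were generated, so the sketch below only checks the shape of the sampled values; the helper name `looks_like_hex_id` is illustrative.

```python
import re

# Pattern for a 32-character lowercase hexadecimal string.
HEX_ID = re.compile(r"[0-9a-f]{32}")

# The five sample values listed in the report.
sample_ids = [
    "06b8999e2fba1a1fbc88172c00ba8bc7",
    "18955e83d337fd6b2def6b18a428ac77",
    "4e7b3e00288586ebd08712fdd0374a03",
    "b2b6027bc5c5109e529d4dc6358b12c3",
    "4f2d8ab171c80ec8364f7c12e35b23ad",
]


def looks_like_hex_id(value: str) -> bool:
    """Return True if the whole string is 32 lowercase hex characters."""
    return HEX_ID.fullmatch(value) is not None


all_valid = all(looks_like_hex_id(v) for v in sample_ids)

# 99441 rows of exactly 32 characters give the reported character total.
total_chars = 99_441 * 32

print(all_valid, total_chars)
```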
Value                             Count  Frequency (%)
06b8999e2fba1a1fbc88172c00ba8bc7  1      < 0.1%
4d27341acd30a36bca39008ee9bb9050  1      < 0.1%
b2b6027bc5c5109e529d4dc6358b12c3  1      < 0.1%
4f2d8ab171c80ec8364f7c12e35b23ad  1      < 0.1%
879864dab9bc3047522c92c82e1212b8  1      < 0.1%
fd826e7cf63160e536e0908c76c3f441  1      < 0.1%
5e274e7a0c3809e14aba7ad5aae0d407  1      < 0.1%
5adf08e34b2e993982a47070956c5c65  1      < 0.1%
4b7139f34592b3a31687243a302fa75b  1      < 0.1%
9fb35e4ed6f0a14a4977cd9aea4042bb  1      < 0.1%
Other values (99431)              99431  > 99.9%

Most occurring characters

Value             Count    Frequency (%)
5                 199366   6.3%
f                 199255   6.3%
2                 199235   6.3%
c                 199193   6.3%
1                 199150   6.3%
b                 199137   6.3%
8                 199094   6.3%
3                 199061   6.3%
7                 198923   6.3%
6                 198760   6.2%
Other values (6)  1190938  37.4%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    1988533  62.5%
Lowercase Letter  1193579  37.5%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
5      199366  10.0%
2      199235  10.0%
1      199150  10.0%
8      199094  10.0%
3      199061  10.0%
7      198923  10.0%
6      198760  10.0%
9      198689  10.0%
0      198310  10.0%
4      197945  10.0%
Lowercase Letter
Value  Count   Frequency (%)
f      199255  16.7%
c      199193  16.7%
b      199137  16.7%
e      198713  16.6%
a      198646  16.6%
d      198635  16.6%

Most occurring scripts

Value   Count    Frequency (%)
Common  1988533  62.5%
Latin   1193579  37.5%

Most frequent character per script

Common
Value  Count   Frequency (%)
5      199366  10.0%
2      199235  10.0%
1      199150  10.0%
8      199094  10.0%
3      199061  10.0%
7      198923  10.0%
6      198760  10.0%
9      198689  10.0%
0      198310  10.0%
4      197945  10.0%
Latin
Value  Count   Frequency (%)
f      199255  16.7%
c      199193  16.7%
b      199137  16.7%
e      198713  16.6%
a      198646  16.6%
d      198635  16.6%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3182112  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
5                 199366   6.3%
f                 199255   6.3%
2                 199235   6.3%
c                 199193   6.3%
1                 199150   6.3%
b                 199137   6.3%
8                 199094   6.3%
3                 199061   6.3%
7                 198923   6.3%
6                 198760   6.2%
Other values (6)  1190938  37.4%

customer_zip_code_prefix
Real number (ℝ)

High correlation 

Distinct: 14994
Distinct (%): 15.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35137.475
Minimum: 1003
Maximum: 99990
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 777.0 KiB

Quantile statistics

Minimum: 1003
5-th percentile: 3315
Q1: 11347
Median: 24416
Q3: 58900
95-th percentile: 90550
Maximum: 99990
Range: 98987
Interquartile range (IQR): 47553
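
The range and interquartile range follow directly from the other quantiles in this table. A minimal sketch of that arithmetic, using only the figures reported here (variable names are illustrative):

```python
# Quantiles copied from the "Quantile statistics" table.
minimum, maximum = 1003, 99990
q1, q3 = 11347, 58900

# Range spans the full extent of the data; the IQR spans the middle 50%.
value_range = maximum - minimum
iqr = q3 - q1

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
```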

Descriptive statistics

Standard deviation: 29797.939
Coefficient of variation (CV): 0.84803872
Kurtosis: -0.78820393
Mean: 35137.475
Median Absolute Deviation (MAD): 16386
Skewness: 0.77902506
Sum: 3.4941056 × 10^9
Variance: 8.8791717 × 10^8
Monotonicity: Not monotonic
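
Several of these statistics are derived from one another, which gives a quick consistency check. The sketch below recomputes the coefficient of variation, the variance, and the sum from the rounded mean, standard deviation, and row count shown in the report, so it can only agree up to rounding.

```python
import math

# Rounded figures copied from the report.
n = 99_441
mean = 35137.475
std = 29797.939

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance is the squared standard deviation
total = mean * n      # sum of all values

print(f"CV       ~ {cv:.8f}")
print(f"Variance ~ {variance:.7e}")
print(f"Sum      ~ {total:.7e}")

# Compare against the reported values, allowing for rounding of the inputs.
cv_ok = math.isclose(cv, 0.84803872, rel_tol=1e-6)
variance_ok = math.isclose(variance, 8.8791717e8, rel_tol=1e-6)
sum_ok = math.isclose(total, 3.4941056e9, rel_tol=1e-6)
```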
Histogram with fixed size bins (bins=50)
Common values

Value                 Count  Frequency (%)
22790                 142    0.1%
24220                 124    0.1%
22793                 121    0.1%
24230                 117    0.1%
22775                 110    0.1%
29101                 101    0.1%
13212                 95     0.1%
35162                 93     0.1%
22631                 89     0.1%
38400                 87     0.1%
Other values (14984)  98362  98.9%

Minimum 10 values

Value  Count  Frequency (%)
1003   1      < 0.1%
1004   2      < 0.1%
1005   6      < 0.1%
1006   2      < 0.1%
1007   4      < 0.1%
1008   4      < 0.1%
1009   7      < 0.1%
1011   5      < 0.1%
1012   3      < 0.1%
1013   3      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
99990  1      < 0.1%
99980  2      < 0.1%
99970  1      < 0.1%
99965  2      < 0.1%
99960  2      < 0.1%
99955  3      < 0.1%
99950  9      < 0.1%
99940  2      < 0.1%
99930  5      < 0.1%
99925  1      < 0.1%
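
The profiler typed customer_zip_code_prefix as a real number, and the smallest values have only four digits. If the prefixes are meant to be fixed-width five-digit codes (an assumption; the report does not state the intended width), leading zeros were probably dropped when the column was parsed as an integer. A minimal sketch of restoring them, with a hypothetical helper name and default width:

```python
def format_zip_prefix(prefix: int, width: int = 5) -> str:
    """Render a numeric ZIP prefix as a zero-padded string.

    The default width of 5 is an assumption, not taken from the report.
    """
    return str(prefix).zfill(width)


# Values that appear in the report's tables and sample rows.
padded = [format_zip_prefix(p) for p in (1003, 9790, 14409, 99990)]
print(padded)
```

Keeping the column as text would also stop the profiler from computing a mean, skewness, and similar statistics that have little meaning for a code.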
customer_city
Text

Distinct: 4119
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Memory size: 777.0 KiB

Length

Max length: 32
Median length: 27
Mean length: 10.344466
Min length: 3

Characters and Unicode

Total characters: 1028664
Distinct characters: 31
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1144
Unique (%): 1.2%

Sample

1st row: franca
2nd row: sao bernardo do campo
3rd row: sao paulo
4th row: mogi das cruzes
5th row: campinas
Value                Count  Frequency (%)
sao                  21050  12.1%
paulo                15606  9.0%
de                   9684   5.6%
rio                  8278   4.7%
janeiro              6882   3.9%
do                   4276   2.5%
belo                 2833   1.6%
horizonte            2798   1.6%
brasilia             2140   1.2%
porto                1648   0.9%
Other values (3285)  99118  56.9%

Most occurring characters

Value              Count   Frequency (%)
a                  169618  16.5%
o                  126534  12.3%
i                  78754   7.7%
r                  76497   7.4%
(space)            74872   7.3%
e                  67028   6.5%
s                  62903   6.1%
n                  45721   4.4%
u                  44917   4.4%
l                  44815   4.4%
Other values (21)  237005  23.0%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   953332  92.7%
Space Separator    74872   7.3%
Dash Punctuation   232     < 0.1%
Other Punctuation  226     < 0.1%
Decimal Number     2       < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count   Frequency (%)
a                  169618  17.8%
o                  126534  13.3%
i                  78754   8.3%
r                  76497   8.0%
e                  67028   7.0%
s                  62903   6.6%
n                  45721   4.8%
u                  44917   4.7%
l                  44815   4.7%
p                  37119   3.9%
Other values (16)  199426  20.9%
Decimal Number
Value  Count  Frequency (%)
1      1      50.0%
4      1      50.0%
Space Separator
Value    Count  Frequency (%)
(space)  74872  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      232    100.0%
Other Punctuation
Value  Count  Frequency (%)
'      226    100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   953332  92.7%
Common  75332   7.3%

Most frequent character per script

Latin
Value              Count   Frequency (%)
a                  169618  17.8%
o                  126534  13.3%
i                  78754   8.3%
r                  76497   8.0%
e                  67028   7.0%
s                  62903   6.6%
n                  45721   4.8%
u                  44917   4.7%
l                  44815   4.7%
p                  37119   3.9%
Other values (16)  199426  20.9%
Common
Value    Count  Frequency (%)
(space)  74872  99.4%
-        232    0.3%
'        226    0.3%
1        1      < 0.1%
4        1      < 0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1028664  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
a                  169618  16.5%
o                  126534  12.3%
i                  78754   7.7%
r                  76497   7.4%
(space)            74872   7.3%
e                  67028   6.5%
s                  62903   6.1%
n                  45721   4.4%
u                  44917   4.4%
l                  44815   4.4%
Other values (21)  237005  23.0%

customer_state
Categorical

High correlation 

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 777.0 KiB

Value              Count
SP                 41746
RJ                 12852
MG                 11635
RS                 5466
PR                 5045
Other values (22)  22697

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 198882
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SP
2nd row: SP
3rd row: SP
4th row: SP
5th row: SP

Common Values

Value              Count  Frequency (%)
SP                 41746  42.0%
RJ                 12852  12.9%
MG                 11635  11.7%
RS                 5466   5.5%
PR                 5045   5.1%
SC                 3637   3.7%
BA                 3380   3.4%
DF                 2140   2.2%
ES                 2033   2.0%
GO                 2020   2.0%
Other values (17)  9487   9.5%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
sp                 41746  42.0%
rj                 12852  12.9%
mg                 11635  11.7%
rs                 5466   5.5%
pr                 5045   5.1%
sc                 3637   3.7%
ba                 3380   3.4%
df                 2140   2.2%
es                 2033   2.0%
go                 2020   2.0%
Other values (17)  9487   9.5%

Most occurring characters

Value             Count  Frequency (%)
S                 53947  27.1%
P                 50517  25.4%
R                 24193  12.2%
M                 14152  7.1%
G                 13655  6.9%
J                 12852  6.5%
A                 5812   2.9%
E                 5371   2.7%
C                 5054   2.5%
B                 3916   2.0%
Other values (7)  9413   4.7%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  198882  100.0%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
S                 53947  27.1%
P                 50517  25.4%
R                 24193  12.2%
M                 14152  7.1%
G                 13655  6.9%
J                 12852  6.5%
A                 5812   2.9%
E                 5371   2.7%
C                 5054   2.5%
B                 3916   2.0%
Other values (7)  9413   4.7%

Most occurring scripts

Value  Count   Frequency (%)
Latin  198882  100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
S                 53947  27.1%
P                 50517  25.4%
R                 24193  12.2%
M                 14152  7.1%
G                 13655  6.9%
J                 12852  6.5%
A                 5812   2.9%
E                 5371   2.7%
C                 5054   2.5%
B                 3916   2.0%
Other values (7)  9413   4.7%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  198882  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
S                 53947  27.1%
P                 50517  25.4%
R                 24193  12.2%
M                 14152  7.1%
G                 13655  6.9%
J                 12852  6.5%
A                 5812   2.9%
E                 5371   2.7%
C                 5054   2.5%
B                 3916   2.0%
Other values (7)  9413   4.7%

Interactions


Correlations

                          customer_state  customer_zip_code_prefix
customer_state            1.000           0.922
customer_zip_code_prefix  0.922           1.000
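
The matrix pairs a categorical variable with a numeric one, so the 0.922 is presumably an association measure computed after binning the numeric column rather than a linear correlation. The report does not name the measure. As an illustration only, the sketch below computes the plain (not bias-corrected) Cramér's V on made-up toy data; the function name and the data are hypothetical, and the profiler's own calculation may differ.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plain Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(chi2 / (n * k))


# Toy data (made up): state labels against a coarse ZIP-prefix bin.
states = ["SP", "SP", "RJ", "RJ"]
zip_bins = ["low", "low", "high", "high"]
perfect = cramers_v(states, zip_bins)

# Independent toy data: every state appears equally often in every bin.
independent = cramers_v(["SP", "SP", "RJ", "RJ"], ["low", "high", "low", "high"])

print(perfect, independent)
```

A value near 1 means that knowing the bin almost determines the state, which is the kind of relationship one would expect if postal prefixes are allocated by region.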

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

       customer_id                       customer_zip_code_prefix  customer_city          customer_state
0      06b8999e2fba1a1fbc88172c00ba8bc7  14409                     franca                 SP
1      18955e83d337fd6b2def6b18a428ac77  9790                      sao bernardo do campo  SP
2      4e7b3e00288586ebd08712fdd0374a03  1151                      sao paulo              SP
3      b2b6027bc5c5109e529d4dc6358b12c3  8775                      mogi das cruzes        SP
4      4f2d8ab171c80ec8364f7c12e35b23ad  13056                     campinas               SP
5      879864dab9bc3047522c92c82e1212b8  89254                     jaragua do sul         SC
6      fd826e7cf63160e536e0908c76c3f441  4534                      sao paulo              SP
7      5e274e7a0c3809e14aba7ad5aae0d407  35182                     timoteo                MG
8      5adf08e34b2e993982a47070956c5c65  81560                     curitiba               PR
9      4b7139f34592b3a31687243a302fa75b  30575                     belo horizonte         MG

Last rows

       customer_id                       customer_zip_code_prefix  customer_city          customer_state
99431  be842c57a8c5a62e9585dd72f22b6338  99150                     marau                  RS
99432  f255d679c7c86c24ef4861320d5b7675  13500                     rio claro              SP
99433  14308d2303a3e2bdf4939b86c46d2679  66033                     belem                  PA
99434  f5a0b560f9e9427792a88bec97710212  7790                      cajamar                SP
99435  7fe2e80252a9ea476f950ae8f85b0f8f  35500                     divinopolis            MG
99436  17ddf5dd5d51696bb3d7c6291687be6f  3937                      sao paulo              SP
99437  e7b71a9017aa05c9a7fd292d714858e8  6764                      taboao da serra        SP
99438  5e28dfe12db7fb50a4b2f691faecea5e  60115                     fortaleza              CE
99439  56b18e2166679b8a959d72dd06da27f9  92120                     canoas                 RS
99440  274fa6071e5e17fe303b9748641082c8  6703                      cotia                  SP